Combining Shallow and Linguistically Motivated Features in Native Language Identification

Authors

  • Serhiy Bykh
  • Sowmya Vajjala
  • Julia Krivanek
  • Walt Detmar Meurers
Abstract

1. rc. word ng.: recurring word-based n-grams
2. rc. OCPOS ng.: recurring n-grams in which open-class words are replaced by POS tags
3. rc. word dep.: recurring word-based dependencies (MATE): a head and all its immediate dependents, ordered as in the sentence
   Ex.: My own experience confirms this fact. ⇒ (my, own, experience); (experience, confirms, fact); (this, fact)
4. rc. func. dep.: recurring function-based dependencies: each dependent is replaced by its grammatical function
   ⇒ (NMOD, NMOD, experience); (SBJ, confirms, OBJ); (NMOD, fact)
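The four feature types listed above can be illustrated with a minimal Python sketch. Several things here are assumptions rather than details taken from the abstract: the parse of the example sentence is written by hand (it is not MATE output), punctuation is skipped when collecting dependents, open-class words are approximated by Penn Treebank tags starting with NN, VB, JJ or RB, and "recurring" is read as "occurring in at least two texts". All function names are hypothetical.

```python
from collections import Counter

# Hand-written parse of the example sentence (an assumption, not parser output).
# Each token: (form, POS tag, 1-based head index with 0 = root, grammatical function).
SENTENCE = [
    ("My", "PRP$", 3, "NMOD"),
    ("own", "JJ", 3, "NMOD"),
    ("experience", "NN", 4, "SBJ"),
    ("confirms", "VBZ", 0, "ROOT"),
    ("this", "DT", 6, "NMOD"),
    ("fact", "NN", 4, "OBJ"),
    (".", ".", 4, "P"),
]

# Assumption: open-class words are those whose tag starts with one of these prefixes.
OPEN_CLASS_PREFIXES = ("NN", "VB", "JJ", "RB")


def ngrams(items, n):
    """All contiguous n-grams of a sequence, as tuples."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


def word_ngrams(sent, n):
    """Type 1: n-grams over lowercased word forms."""
    return ngrams([form.lower() for form, _, _, _ in sent], n)


def ocpos_ngrams(sent, n):
    """Type 2: n-grams in which open-class words are replaced by their POS tag."""
    items = [
        pos if pos.startswith(OPEN_CLASS_PREFIXES) else form.lower()
        for form, pos, _, _ in sent
    ]
    return ngrams(items, n)


def dependencies(sent, use_functions=False):
    """Types 3 and 4: each head with its immediate dependents in sentence order.

    With use_functions=True every dependent is replaced by its grammatical
    function while the head keeps its word form.
    """
    feats = []
    for head_idx, (head_form, _, _, _) in enumerate(sent, start=1):
        group = []
        for idx, (form, pos, head, func) in enumerate(sent, start=1):
            if idx == head_idx:
                group.append(head_form.lower())
            elif head == head_idx and pos != ".":  # assumption: skip punctuation
                group.append(func if use_functions else form.lower())
        if len(group) > 1:  # keep only heads that have at least one dependent
            feats.append(tuple(group))
    return feats


def recurring(feature_lists, min_texts=2):
    """Keep features found in at least min_texts texts (assumed reading of 'recurring')."""
    doc_freq = Counter()
    for feats in feature_lists:
        doc_freq.update(set(feats))
    return {feat for feat, count in doc_freq.items() if count >= min_texts}


print(dependencies(SENTENCE))
print(dependencies(SENTENCE, use_functions=True))
```

Under these assumptions the two print calls reproduce the tuples given in the example for types 3 and 4. In practice `recurring` would be applied to the per-text feature lists of a training corpus before the features are passed to a classifier.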

Similar resources

Identification of Multi-word Expressions by Combining Multiple Linguistic Information Sources

We propose a framework for using multiple sources of linguistic information in the task of identifying multiword expressions in natural language texts. We define various linguistically motivated classification features and introduce novel ways for computing them. We then manually define interrelationships among the features, and express them in a Bayesian network. The result is a powerful class...

Semantic frames as an anchor representation for sentiment analysis

Current work on sentiment analysis is characterized by approaches with a pragmatic focus, which use shallow techniques in the interest of robustness but often rely on ad-hoc creation of data sets and methods. We argue that progress towards deep analysis depends on a) enriching shallow representations with linguistically motivated, rich information, and b) focussing different branches of researc...

Language Identification in Code-Switching Scenario

This paper describes a CRF-based, token-level language identification system entered in the Language Identification in Code-Switched (CS) Data task of CodeSwitch 2014. Our system hinges on using conditional posterior probabilities for the individual codes (words) in code-switched data to solve the language identification task. We also experiment with other linguistically motivated language specific as ...

Automated Scoring of Picture-based Story Narration

This work investigates linguistically motivated features for automatically scoring a spoken picture-based narration task. Specifically, we build scoring models with features for story development, language use and task relevance of the response. Results show that combinations of these features outperform a baseline system that uses state-of-the-art speech-based features, and that best results ar...

Using linguistically motivated features for paragraph boundary identification

In this paper we propose a machine-learning approach to paragraph boundary identification which utilizes linguistically motivated features. We investigate the relation between paragraph boundaries and discourse cues, pronominalization and information structure. We test our algorithm on German data and report improvements over three baselines including a reimplementation of Sporleder & Lapata’s (...

Publication date: 2013